1.   INTRODUCTION

In  a  previous  report  (Reference  1) ,  the  author  proposed 
the  use  of  sequential  (successive)  differences  as  an  aid  in 
identifying  outlier  data  points  and  in  selecting  the  appropriate 
order  polynomial  for  smoothing  of  3-D  data  on  torpedo  and  target 
paths.   In  this  report,  the  concept  of  successive  differences 
is  explored  and  developed  with  the  specific  intent  of  making  it 
suitable  for  inclusion  in  a  computer  program  for  smoothing 
3-D  data. 

This report is a working paper rather than a polished formal
report. Some of the discussions are rather lengthy, and points
of interest are perhaps belabored or repeated unnecessarily. The
reader's indulgence is invited and some skimming is expected.
Nevertheless, the general picture appears clear, and the
possibility of using the model for identification of outliers
appears reasonable.


2.   DEVELOPMENT OF MODEL
A.    General Considerations

For the purposes of this analysis, it will be assumed
that an observed datum $x_i$ can be expressed in the form

$$x_i = x(t_i) = P_x(t_i) + n_i + d_i$$

where $P_x(t)$ is a polynomial in time $t$, $n_i$ is a measurement
error which will be called "noise," and $d_i$ is a perturbation
or disturbance which, if present with sufficient amplitude,
will cause $x_i$ to be a "wild" datum or outlier.

It will be assumed that each component $(x, y, z)$ of a
torpedo (T) or target (submarine, S) path can be represented
as a polynomial of some low degree $k$ in time $t$. (It is
suggested that the restriction $k \le 4$ be incorporated in the
smoothing algorithm.) Thus

$$P_x(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_k t^k$$


The noise component $n_i$ is assumed to be a realization
of a random variable $N_i$ which is Normally distributed with
mean 0 and common variance $\sigma^2$ ($N_i \sim N(0, \sigma^2)$), and it is
also assumed that noise components $N_i$ and $N_j$ at times $t_i$ and
$t_j$ are independent.


Finally, it will be assumed that a disturbance $d_i$
should have fairly rare occurrence. Evidence of the existence
of a non-zero value of $d_i$ can be obtained from examination
of successive differences which, when sufficiently high order
differences are considered, are functions of the $(n_i + d_i)$'s
and not of the $P(t_i)$'s. Crossing of a threshold value for
successive differences, which is seldom crossed when no $d_i$'s
are present, can then be used as an indication of the presence
of a disturbance $d_i$ and hence of an outlier point. Note
that noise alone can cause an occasional crossing, depending
on the threshold selected, and that a disturbance may fail to
cause a threshold crossing, depending on its magnitude and its
interaction with noise. This will be elaborated as the
development of the model progresses.


B.   Successive Differences

A definition of successive or sequential differences
suitable for our purposes is presented in the accompanying
table (Table 1) and the notation which follows. Since the
3-D data to be smoothed involves data points equally spaced
in time, this has been incorporated in the model. Further,
the initial time for any data segment can be arbitrarily set
to zero for model development, hence $t_0 = 0$. Also, selection
of the common time interval as the unit of time yields

$$t_{i+1} = t_i + 1.$$
The selection of the secondary subscript $i$ in the
ordered differences is somewhat arbitrary. As will be noted when
disturbances are introduced, it appears desirable for
computational convenience to identify the even ordered differences
($D_{2i}$ and $D_{4i}$) with the observation $x_i$ for each $i$. For
example, a large isolated disturbance $d_i$ in $x_i$ will produce
large perturbations in $D_{2i}$ and $D_{4i}$, hence the latter can
be used to identify $x_i$ as an 'outlier.' For the odd ordered
differences ($D_{1i}$ and $D_{3i}$) the situation is not as clear.
For example, if a large perturbation is observed in $D_{1i}$ it
is not clearly evident whether $x_i$ or $x_{i-1}$ should be
considered as the 'outlier.' At this stage in the development,
it would appear that the even ordered successive differences
should be the primary identifiers of 'outliers.'


C.   The Polynomial Component

To illustrate the contribution of the polynomial component
to successive differences, three cases (linear, quadratic, and
cubic polynomials) are presented in Tables 2.1, 2.2 and 2.3.
It can readily be seen that a polynomial of degree $k$ contributes
to $D_{ji}$ for $j \le k$ but that for $j > k$
the number $D_{ji}$ represents noise only unless a disturbance
is present. Thus detection of a disturbance, and hence
identification of an outlier, becomes simpler if a sufficiently high
order difference can be used and the polynomial component
eliminated.
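The elimination of the polynomial component can be checked numerically. The following is a minimal Python sketch, not part of the original report; the helper name `difference_table` and the cubic coefficients are illustrative assumptions. It builds repeated differences of noise-free cubic samples taken at unit time steps.

```python
def difference_table(x, max_order):
    """Return [x, first differences, second differences, ...]."""
    table = [list(x)]
    for _ in range(max_order):
        prev = table[-1]
        # Each new row holds the differences of neighbouring entries.
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


# Hypothetical cubic a0 + a1*t + a2*t^2 + a3*t^3, sampled at t = 0, 1, ..., 8.
coeffs = [2.0, -1.0, 0.5, 0.25]
samples = [sum(a * t ** p for p, a in enumerate(coeffs)) for t in range(9)]

table = difference_table(samples, 4)
third = table[3]   # constant, equal to 3! * a3
fourth = table[4]  # identically zero for a cubic
```

For this cubic the third differences are constant at $3!\,a_3$ and the fourth differences vanish, so in real data a fourth order difference would carry only noise and any disturbance.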
TABLE 2.1.   SUCCESSIVE DIFFERENCES

Linear Case:   $x_i = x(t_i) = a_0 + a_1 t_i + n_i$

Quadratic Case:   $x_i = x(t_i) = a_0 + a_1 t_i + a_2 t_i^2 + n_i$
The question of how high the order of the difference must
be to eliminate the polynomial component is not clear-cut. As
a matter of fact, the polynomial component does not have to be
eliminated entirely for a particular order of successive
differences to be used to identify outliers. It is sufficient that
the contribution of the polynomial component $P_{ji}$ be small with
respect to the noise component $N_{ji}$ for $D_{ji}$ to be useful as
an indicator of a disturbance $d_i$ in $x_i$.

(This is intimately related to the problem of fitting
polynomials to segments of a torpedo path. If (1) the torpedo path
does not change too radically, (2) the length of the path segment
to be fitted is short enough, and (3) the data rate is high
enough, then low order polynomials can provide satisfactory
approximations to the path. In Reference 1, path segments of
21 and 11 points were explored briefly. Path segments consisting
of 7 points have been suggested but not examined as yet. In
many of the segments examined, polynomials of order $k \le 3$
produced acceptably small and apparently random residual errors
for 11 point segments.)

From Tables 1 and 2.1-2.3 it can be seen that a successive
difference $D_{ji}$ of order $j$ involves $j+1$ successive observations
$x_i$. For $j \le 4$, as proposed for screening for outliers, at most
five data points are involved. These can be fitted reasonably
well by polynomials of order $k \le 3$. Supporting evidence for
this is available in the successive differences for the 3-D
data on the torpedo run examined in this study. Discussion of
the analysis justifying this contention will be presented in
a later section.


An alternative has been suggested. It incorporates
control information (information obtained by alternate means on
the command and control of a torpedo) to provide appropriate
values for the polynomial coefficients and to indicate
appropriate polynomial order for fitting data. In the linear case
this information should be in the form of a specific value or
bound for $a_1$. Since $a_1 = |V| \cos\theta$, as illustrated in the
accompanying sketch with $V$ a velocity vector and $|V|$ the
magnitude of $V$, one possible bound for $a_1$ would be $a_1 \le |V|$.

[Sketch: velocity vector $V$ and angle $\theta$ on $x$-$y$ axes.]

This will be shown to dominate the noise component $N_{1i}$ for
3-D data. Information from control data on $\theta$ could be used
but would require $a_1$ (and hence the threshold $D_1^*$) to be
treated as a function of position on the torpedo path and
hence as a function of $t_i$. For the purpose of preliminary
screening for outliers, it would appear preferable to
concentrate on successive differences of sufficiently high order
that the polynomial component can be considered negligible.
With this constraint, a constant threshold $D_j^*$ can be used
for all successive differences $D_{ji}$ of order $j$.
D.   The Noise Component

When the polynomial component $P_{ji}$ has been eliminated,
attention can be concentrated on the noise component $n_{ji}$ of
the $j$th order successive differences. In engineering parlance,
the problem of identifying outliers can now be considered as
one of detecting a signal (a disturbance $d_i$) in the presence
of noise ($n_{ji}$). The thresholds $D_j^*$ can be expressed as specified
levels of $D_{ji}$ which are seldom exceeded by noise only and hence
which indicate the presence of a disturbance $d_i$. In order to
establish values for $D_j^*$, a statistical analysis of the noise
component is required.

Recall the assumptions in Section 2.A that the noise
component $n_i$ is a realization of a random variable $N_i$ with
$N_i \sim N(0, \sigma^2)$ and that $N_i$ and $N_j$ are independent for $i \ne j$.
It can be established from the definitions of successive
differences that the noise component $N_{ji}$ of $D_{ji}$ can be defined
in terms of the noise components $n_i$ of $x_i$ as follows:

$$n_{1i} = n_i - n_{i-1}$$
$$n_{2i} = n_{i+1} - 2n_i + n_{i-1}$$
$$n_{3i} = n_{i+1} - 3n_i + 3n_{i-1} - n_{i-2}$$
$$n_{4i} = n_{i+2} - 4n_{i+1} + 6n_i - 4n_{i-1} + n_{i-2}$$

Each of these noise components has mean 0 since the $n_i$'s
are assumed to have mean 0.
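The same weights apply to the observations themselves, so the four formulas above fix the index convention for $D_{ji}$. The following Python sketch is an illustration under that reading, not code from the report; the function names `diff_coeffs` and `successive_difference` are assumptions. The newest sample used by $D_{ji}$ is taken to be $x_{i + \lfloor j/2 \rfloor}$, which centres the even orders on $x_i$.

```python
from math import comb


def diff_coeffs(j):
    """Binomial weights of the j-th successive difference, newest sample first."""
    return [(-1) ** m * comb(j, m) for m in range(j + 1)]


def successive_difference(x, j, i):
    """D_{ji} for equally spaced samples x[0], x[1], ...

    Assumed convention (inferred from the noise formulas): the newest
    sample entering D_{ji} is x[i + j // 2].
    """
    top = i + j // 2
    if top - j < 0 or top >= len(x):
        raise IndexError("D_{ji} needs samples outside the data segment")
    return sum(c * x[top - m] for m, c in enumerate(diff_coeffs(j)))


# A single spike of height 5 at index 4 on an otherwise flat record.
spike = [0, 0, 0, 0, 5, 0, 0, 0, 0]
d2 = successive_difference(spike, 2, 4)   # weight -2 on x_4
d4 = successive_difference(spike, 4, 4)   # weight +6 on x_4
```

With this convention a spike in $x_4$ gives its largest second and fourth order contributions at $i = 4$, which is the property used later to tie the even orders to the observation itself.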

The variance $V_j$ of $N_{ji}$ can be expressed in terms of
the common variance $\sigma^2$ of the $n_i$'s using the independence
property of the $n_i$'s. These are presented below together
with some of the covariances $C(n_{ji}, n_{kl})$ of interest later.

1st Order Noise Differences ($N_{1i}$)

$$V_1 = 2\sigma^2$$
$$C(n_{1i}, n_{1,i+1}) = -\sigma^2$$

2nd Order Noise Differences ($N_{2i}$)

$$V_2 = 6\sigma^2$$
$$C(n_{2i}, n_{2,i+1}) = -4\sigma^2$$
$$C(n_{2i}, n_{2,i+2}) = \sigma^2$$

3rd Order Noise Differences ($N_{3i}$)

$$V_3 = 20\sigma^2$$
$$C(n_{3i}, n_{3,i+1}) = -15\sigma^2$$
$$C(n_{3i}, n_{3,i+2}) = 6\sigma^2$$

4th Order Noise Differences ($N_{4i}$)

$$V_4 = 70\sigma^2$$
$$C(n_{4i}, n_{4,i+1}) = -56\sigma^2$$
$$C(n_{4i}, n_{4,i+2}) = 28\sigma^2$$
Selected Covariances

$$C(n_{2i}, n_{3i}) = 10\sigma^2$$
$$C(n_{2i}, n_{3,i+1}) = -10\sigma^2$$
$$C(n_{2i}, n_{4i}) = -20\sigma^2$$
$$C(n_{3i}, n_{4i}) = -35\sigma^2$$
$$C(n_{3,i+1}, n_{4i}) = 35\sigma^2$$

Since all the $N_{ji}$'s are normally distributed with mean 0,
it can be established that

$$P(|N_{ji}| \le 3\sqrt{V_j}) = 0.997$$

If we set $D_j^* = 3\sqrt{V_j}$ then, for applications in which the
polynomial contributions to $D_{ji}$ have been eliminated, there will
be, on the average, less than one time in 200 independent trials
in which $|D_{ji}|$ will exceed $D_j^*$ due to noise alone. The
suggested thresholds for detection of disturbances are given
below.


  $j$      $D_j^*$
  0        $3\sigma$
  1        $4.24\sigma$
  2        $7.348\sigma$
  3        $13.416\sigma$
  4        $25.10\sigma$

The term $D_j^*$ with $j = 0$ corresponds to $V_0 = \sigma^2$ (i.e.,
the variance of $N_i$ and hence of $x_i$ when no polynomial
is involved).
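The tabulated thresholds follow from the fact that the variance factor $V_j / \sigma^2$ is the sum of the squared binomial weights of the $j$th difference, which equals the central binomial coefficient $\binom{2j}{j}$. A short Python sketch of this calculation is given here; the function names are illustrative assumptions, and $\sigma$ defaults to 1 so that results are in units of $\sigma$.

```python
from math import comb, sqrt


def noise_variance_factor(j):
    """V_j / sigma^2: sum of squared binomial weights of the j-th difference."""
    return sum(comb(j, m) ** 2 for m in range(j + 1))


def threshold(j, sigma=1.0):
    """Three-standard-deviation threshold D_j^* = 3 * sqrt(V_j)."""
    return 3.0 * sqrt(noise_variance_factor(j)) * sigma


factors = [noise_variance_factor(j) for j in range(5)]
thresholds = [threshold(j) for j in range(5)]
```

The factors reproduce 1, 2, 6, 20 and 70, and the thresholds agree with the table to the figures shown.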
The suggested thresholds are worth some further exploration.
As an oversimplified case consider a situation in which no
polynomial contributions are involved, $n_k = 3\sigma$ for some $k$, and
$n_i = 0$ for $i \ne k$. The relationships of the $D_{jk}$'s to the
$D_j^*$'s are shown in the following table.

  $j$    $D_{jk} = n_{jk}$    $D_j^*$          $|n_{jk}|/D_j^*$
  0      $3\sigma$            $3\sigma$        1
  1      $3\sigma$            $4.24\sigma$     .707
  2      $-6\sigma$           $7.35\sigma$     .816
  3      $-9\sigma$           $13.4\sigma$     .671
  4      $18\sigma$           $25.1\sigma$     .717

Since $|n_{2k}|/D_2^*$ is greater than the corresponding expression
for $j = 3$ or $j = 4$, it could be anticipated that the second
order differences (the $D_{2i}$'s) might be better detectors for
disturbances when the polynomial contribution is linear. This
will be demonstrated for an isolated disturbance in a later
section of this report.

The type of information to be seen in the special case of
an isolated noise element $n_k$ can be generalized. The
covariances are useful for this purpose. Note that, comparing
the special case to the covariances,

  Special Case                                       Covariance
  $D_{2k} = -6\sigma$,  $D_{2,k+1} = 3\sigma$        $C(n_{2i}, n_{2,i+1}) = -4\sigma^2$
  $D_{2k} = -6\sigma$,  $D_{3k} = -9\sigma$          $C(n_{2i}, n_{3i}) = +10\sigma^2$
  $D_{2k} = -6\sigma$,  $D_{4k} = 18\sigma$          $C(n_{2i}, n_{4i}) = -20\sigma^2$


This relationship can, perhaps, be made clearer by considering
the correlation coefficients. For example,

$$r(n_{2i}, n_{4i}) = \frac{C(n_{2i}, n_{4i})}{\sqrt{V_2 V_4}} = \frac{-20\sigma^2}{\sqrt{(6\sigma^2)(70\sigma^2)}} = -0.976$$


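The other correlation coefficients of interest here are

$$r(n_{2i}, n_{2,i+1}) = \frac{-4}{6} = -0.667,$$
$$r(n_{2i}, n_{3i}) = \frac{10}{\sqrt{120}} = 0.913,$$
$$r(n_{3i}, n_{3,i+1}) = \frac{-15}{20} = -0.75,$$
$$r(n_{3i}, n_{4i}) = \frac{-35}{\sqrt{1400}} = -0.935,$$

and

$$r(n_{4i}, n_{4,i+1}) = \frac{-56}{70} = -0.8$$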
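Covariances and correlations of this kind can be obtained mechanically by multiplying the weights that two noise differences place on the same underlying $n_i$. The Python sketch below is illustrative only; the names `weights`, `cov` and `corr` are assumptions, results are in units of $\sigma^2$, and the index convention is the one implied by the formulas for $n_{1i}$ through $n_{4i}$.

```python
from math import comb, sqrt


def weights(j, i):
    """Map sample index -> weight of that n in the noise difference n_{ji}."""
    top = i + j // 2  # newest sample entering n_{ji} (assumed convention)
    return {top - m: (-1) ** m * comb(j, m) for m in range(j + 1)}


def cov(j1, i1, j2, i2):
    """Covariance of n_{j1,i1} and n_{j2,i2} in units of sigma^2."""
    w1, w2 = weights(j1, i1), weights(j2, i2)
    # Independent n's: only shared sample indices contribute.
    return sum(c * w2[k] for k, c in w1.items() if k in w2)


def corr(j1, i1, j2, i2):
    """Correlation coefficient of n_{j1,i1} and n_{j2,i2}."""
    return cov(j1, i1, j2, i2) / sqrt(cov(j1, i1, j1, i1) * cov(j2, i2, j2, i2))


r24 = corr(2, 0, 4, 0)
r44 = corr(4, 0, 4, 1)
```

Because the noise is assumed stationary, only the offset between the two indices matters, so the choice $i = 0$ in the calls is arbitrary.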
These can be interpreted as follows. In general, if $n_{2i}$ has
a large value, then $n_{2,i+1}$ and $n_{4i}$ can be expected to have
fairly large values of the opposite sign and $n_{3i}$ a fairly
large value of the same sign. The importance of this in
detecting outliers is that the information provided by different
orders of differences at the same point and by differences of
the same order at adjacent points is primarily of a confirmatory
nature rather than complementary. This can be translated into
the more practical statement that, for example, if a disturbance
in $x_i$ does not cause a crossing of $D_4^*$ by $D_{4i}$, then it
will usually not cause a threshold crossing by $D_{2i}$, $D_{3i}$,
$D_{4,i-1}$, or $D_{4,i+1}$. On the other hand, if $D_{4i}$ exceeds
$D_4^*$ in magnitude, then one or more of these other differences
has a reasonable chance of crossing its prescribed threshold.

As a consequence of the confirmatory nature of threshold
crossings and of the fact that $D_{4i}$ is less likely to be
contaminated by a polynomial component, it is suggested that the
testing for outliers be performed by testing only fourth order
differences (the $D_{4i}$'s) for crossing of the appropriate
threshold $D_4^*$.

Before considering the disturbance component of $x_i$,
it would be of interest to consider the relative magnitudes
of polynomial and noise components of 3-D data. Of particular
interest here is the comparison of $a_1$ with $D_1^*$ since these
are the vital components if the first order differences are to be
used for detecting outliers. Since $a_1 = |V| \cos\theta$, it can be
seen that $a_1$ achieves its maximum magnitude when $\theta = 0°$ or
$\theta = 180°$. A plot of the path of the torpedo in the torpedo run
selected for examination in this study and the corresponding data
together with the first four orders of differences are presented
in Appendix A. It can be seen that $\theta = 0°$ occurs in the
vicinity of $t = 950$ and $\theta = 180°$ occurs in the vicinities
of $t = 807$, 853, and 917. An approximate value of $|V|$ is
satisfactory for the present purposes and the value $|V| = 95$
will be used.

Establishment of a bound for the noise in the form

$$P(|N_{1i}| > 3\sigma_{N_1}) < 0.01,$$

with $\sigma_{N_1}^2 = 2\sigma^2$, requires estimation of $\sigma^2$, the noise variance. In
Reference 1, estimates of $\sigma$ as low as 2 or 3 were obtained
for selected segments of the torpedo run to be used here. It
will be assumed for this examination that $\sigma = 4$ and hence that
$\sigma_{N_1} = 5.656$ and hence $3\sigma_{N_1} = 17$.

A boundary for $D_{1i}$ can then be set in the form

$$D_1^* = \pm[\,|V| + 3\sigma_{N_1}] = \pm 112.$$

Thus, only if $D_{1i}$ were greater than +112 or less than -112
would a disturbance be indicated. Using the formula

$$D_1^* = |V| \cos\theta \pm 3\sigma_{N_1}$$

when $\theta$ is given we have

  $\theta$      Lower threshold        Upper threshold
  0°            95 - 17 = 78           95 + 17 = 112
  90°           -17                    +17
  180°          -95 - 17 = -112        -95 + 17 = -78

It can be seen that detection of disturbances in the first order
differences will not be reliable, unless $\theta = 0°$ or $180°$, when
a general threshold of the form

$$D_1^* = \pm[\,|V| + 3\sigma_{N_1}]$$

is used.
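The heading-dependent band for the first order differences can be tabulated with a few lines of Python. This is a sketch under the values assumed in the text ($|V| = 95$, $\sigma = 4$); the function name `first_order_band` and its parameter names are illustrative and do not come from the report.

```python
from math import cos, radians, sqrt


def first_order_band(theta_deg, speed=95.0, sigma=4.0):
    """Lower and upper thresholds for D_1i at heading angle theta.

    centre     = |V| cos(theta), the polynomial (velocity) contribution
    half_width = 3 * sqrt(2) * sigma, three standard deviations of n_1i
    """
    centre = speed * cos(radians(theta_deg))
    half_width = 3.0 * sqrt(2.0) * sigma
    return centre - half_width, centre + half_width


bands = {theta: first_order_band(theta) for theta in (0, 90, 180)}
```

Rounded to whole units these bands match the table: a band only 34 units wide moves through a range of 190 units as $\theta$ varies, which is why a single general threshold of $\pm 112$ is insensitive at most headings.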


E.   The Disturbance Component

The presence of a disturbance or perturbation in an
observation $x_i$ can be represented as an additional component $d_i$
so that

$$x_i = x(t_i) = P(t_i) + n_i + d_i.$$

There are several types of perturbations that could be considered.
One of these, an 'outlier' or isolated disturbance $d_i$ that
occurs in only one observation $x_i$, is the simplest. The effects
of such a disturbance are shown in Table 3.1 and the accompanying
sketch, Figure 3.1. In the sketch both $d$ and the $D_j^*$'s
are expressed in terms of the parameter $\sigma$ (the standard deviation
of the noise component $n_i$). The value $d = 5\sigma$ is used for
illustrative purposes. Also note that the ordinate is

$$x'_{ji} = x_{ji} - P_{ji} - n_{ji} = d_{ji}$$

and hence represents only the disturbance component of $x_{ji}$.

There are several features of the successive differences
that should be noted when an isolated perturbation occurs. First,
consider an observation $x_r$ (in our example $r = 4$) consisting
of an isolated disturbance $d = k\sigma$ without any noise ($n_{ji} = 0$
for all $j$ and $i$) and with polynomial component $P(t_i) = a_0 + a_1 t_i$.
The values of $k$ for which the thresholds ($D_j^*$'s) are achieved
are shown below.

  $j$     $|D_{j4}|$      $D_j^*$          Critical $k$
  2       $2k\sigma$      $7.35\sigma$     3.675
  3       $3k\sigma$      $13.4\sigma$     4.467
  4       $6k\sigma$      $25.1\sigma$     4.183
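The critical values of $k$ are the thresholds divided by the largest weight that the $j$th difference places on the disturbed observation. A minimal Python sketch follows; the function name `critical_k` is an assumption. It uses unrounded thresholds $3\sqrt{V_j}$, so its results differ in the third decimal from the tabulated values, which were formed from thresholds rounded to three figures.

```python
from math import comb, sqrt


def critical_k(j):
    """Smallest k (with d = k*sigma) at which a noise-free isolated
    disturbance brings the largest |D_{ji}| up to the threshold D_j^*."""
    peak = comb(j, j // 2)                     # largest weight: 2, 3, 6 for j = 2, 3, 4
    threshold = 3.0 * sqrt(comb(2 * j, j))     # D_j^* in units of sigma
    return threshold / peak


ks = {j: critical_k(j) for j in (2, 3, 4)}
```

The ordering of the three values is the point of interest: the second order difference responds to the smallest disturbance, followed by the fourth and then the third.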

In the absence of the noise and polynomial components,
the second order difference $D_{2i}$ will provide a threshold
crossing for a smaller isolated disturbance ($d \ge 3.675\sigma$) than either
the third order difference ($d \ge 4.467\sigma$) or the fourth order
difference ($d \ge 4.183\sigma$), and $D_{4i}$ is slightly better than $D_{3i}$.
If assurances could be given that the polynomial component was
no higher than the first degree, then the second order
differences (the $D_{2i}$'s) would appear to provide the most sensitive
location to test for isolated disturbances. If polynomial
components of the second or third degrees are possible then
the fourth order differences (the $D_{4i}$'s) appear to be
preferable for testing.

Next, consider the pattern or signature produced in the
ordered differences by an isolated disturbance at $t_r$. Both
$D_{2i}$ and $D_{4i}$ will contain their maximum contributions from
the disturbance at $D_{2r}$ and $D_{4r}$ (of opposite signs) and both
will have substantial but smaller contributions, opposite in
sign to the central one, at the adjacent points ($D_{2,r-1}$ and
$D_{2,r+1}$, and $D_{4,r-1}$ and $D_{4,r+1}$). The third order differences
(the $D_{3i}$'s) will have contributions of equal magnitudes but
opposite signs at adjacent positions ($D_{3r}$ and $D_{3,r+1}$) and
smaller contributions at the next positions. These signatures,
although clearly recognizable in the graph (see Fig. 3.1),
would be difficult to incorporate in a program for automatic
computer filtering of outliers.

The last item for discussion of isolated disturbances
pertains to the addition of noise and disturbance components.
Consider, now, a disturbance $d = 5\sigma$ in $x_r$ ($x_4$ in Table 3.1)
and its effect on $D_{4r}$ in the presence of noise. A positive
value of $n_{4r}$ will enhance crossing the threshold $D_4^*$ so
attention can be directed to the effects of negative values
for $n_{4r}$. If

$$n_{4r} < -(30\sigma - 25.1\sigma) = -4.9\sigma,$$

that is, below about $-0.586$ standard deviations of $n_{4i}$ (since
$\sqrt{V_4} = 8.37\sigma$), then $D_{4r}$ will not cross the upper threshold
$D_4^* = 25.1\sigma$. For this situation the probability of a
threshold crossing is $P(D_{4r} > D_4^*) = .721$. In this event
$n_{4,r-1}$ and $n_{4,r+1}$ will, in general, be positive since

$$r(n_{4i}, n_{4,i-1}) = r(n_{4i}, n_{4,i+1}) = -0.8 \quad \text{(Section D)}$$

and hence neither $D_{4,r-1}$ nor $D_{4,r+1}$ can be expected to cross
the lower threshold $-D_4^* = -25.1\sigma$. Also, as a consequence of
$r(n_{2i}, n_{4i}) = -0.976$, a negative value for $n_{4r}$ can be expected
to be accompanied by a positive value for $n_{2r}$ and hence $D_{2r}$
will not cross the lower threshold $-D_2^* = -7.35\sigma$. Further,
since $r(n_{2i}, n_{2,i+1}) = -0.667$, neither $D_{2,r-1}$ nor $D_{2,r+1}$
can be expected to cross the upper threshold $D_2^* = +7.35\sigma$.
Similarly the correlations $r(n_{3i}, n_{4i}) = -0.935$ and
$r(n_{3i}, n_{3,i+1}) = -0.75$ make it unlikely that either $D_{3r}$ or
$D_{3,r+1}$ will cross the lower threshold $-D_3^* = -13.4\sigma$ or the
upper threshold $D_3^* = +13.4\sigma$, respectively.
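The crossing probability quoted for $d = 5\sigma$ can be reproduced from the Normal distribution of $n_{4r}$. The Python sketch below is an illustration, not the author's computation; `normal_cdf` and `crossing_probability` are assumed names, and the calculation treats the polynomial contribution as fully eliminated.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def crossing_probability(k, j=4):
    """P(|contribution| + noise exceeds D_j^*) for an isolated
    disturbance d = k*sigma at the point where the j-th difference peaks."""
    sd = sqrt(comb(2 * j, j))         # std. dev. of n_{jr} in units of sigma
    shift = comb(j, j // 2) * k       # size of the disturbance contribution
    threshold = 3.0 * sd              # D_j^* in units of sigma
    # The crossing is missed only if the noise pulls the sum back below threshold.
    return normal_cdf((shift - threshold) / sd)


p4 = crossing_probability(5.0)        # fourth order, d = 5 sigma
```

For $k = 5$ the contribution to $D_{4r}$ is $30\sigma$, the margin over the threshold is about $4.9\sigma$, and the resulting probability is close to 0.72.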
TABLE 3.1

SUCCESSIVE DIFFERENCES

Linear Case:   Isolated Disturbance $d$

(Each entry is listed in the row of its own index $i$.)

  $t_i$  $x_i$                     $D_{1i}$               $D_{2i}$           $D_{3i}$           $D_{4i}$
  0      $a_0 + n_0$
  1      $a_0 + a_1 + n_1$         $a_1 + n_{11}$         $n_{21}$
  2      $a_0 + 2a_1 + n_2$        $a_1 + n_{12}$         $n_{22}$           $n_{32}$           $n_{42} + d$
  3      $a_0 + 3a_1 + n_3$        $a_1 + n_{13}$         $n_{23} + d$       $n_{33} + d$       $n_{43} - 4d$
  4      $a_0 + 4a_1 + n_4 + d$    $a_1 + n_{14} + d$     $n_{24} - 2d$      $n_{34} - 3d$      $n_{44} + 6d$
  5      $a_0 + 5a_1 + n_5$        $a_1 + n_{15} - d$     $n_{25} + d$       $n_{35} + 3d$      $n_{45} - 4d$
  6      $a_0 + 6a_1 + n_6$        $a_1 + n_{16}$         $n_{26}$           $n_{36} - d$       $n_{46} + d$
  7      $a_0 + 7a_1 + n_7$        $a_1 + n_{17}$         $n_{27}$           $n_{37}$
  8      $a_0 + 8a_1 + n_8$        $a_1 + n_{18}$
FIGURE 3.1

[Plot of the disturbance component $x'_{ji}$ against $t$ for an isolated
disturbance $d = 5\sigma$, with the ordinate running from $-30\sigma$ to $30\sigma$
and the thresholds $\pm D_2^* = \pm 7.35\sigma$, $\pm D_3^* = \pm 13.4\sigma$ and
$\pm D_4^* = \pm 25.1\sigma$ marked.]
The proposed use of only one order of successive
difference (namely, $D_{4i}$) to test for outliers appears reasonable for
isolated disturbances. If $D_{4r}$ exceeds its threshold then this
will usually be accompanied by $D_{2r}$ and $D_{3r}$ exceeding
their thresholds in the opposite direction.

Attention can now be directed to disturbances other than
isolated ones. Consider, next, a situation involving
disturbances $d_i$ and $d_j$ in two observations. For simplicity, it
will be assumed that they have the same magnitude, $d$, but can differ
in sign and/or location. The situation with two adjacent
disturbances of the same sign is presented in Table 3.2 and Figure
3.2. Note that the magnitudes of the contributions of the
disturbances to $D_{44}$ and $D_{45}$ ($D_{4r}$ and $D_{4,r+1}$ for equal
disturbances in $x_r$ and $x_{r+1}$) are substantially reduced from
those in the case of an isolated disturbance, as are the contributions
to the next adjacent observations. It is evident that large
adjacent disturbances of the same sign will be less likely to
cause threshold crossings. Note that a large noise component
in one observation ($n_{4r}$, for example) will, in general, be
accompanied by a large noise component of the opposite sign
($r(n_{4i}, n_{4,i+1}) = -0.8$) in the other observation and hence
enhance the probability of a threshold crossing by one of the
differences $D_{4r}$ or $D_{4,r+1}$. In general, two adjacent large
values of the same sign in $D_{2i}$ or $D_{4i}$ are a signature
TABLE 3.2
SUCCESSIVE DIFFERENCES

Linear Case:   Adjacent Equal Disturbances

(Each entry is listed in the row of its own index $i$.)

  $t_i$  $x_i$                     $D_{1i}$               $D_{2i}$           $D_{3i}$           $D_{4i}$
  0      $a_0 + n_0$
  1      $a_0 + a_1 + n_1$         $a_1 + n_{11}$         $n_{21}$
  2      $a_0 + 2a_1 + n_2$        $a_1 + n_{12}$         $n_{22}$           $n_{32}$           $n_{42} + d$
  3      $a_0 + 3a_1 + n_3$        $a_1 + n_{13}$         $n_{23} + d$       $n_{33} + d$       $n_{43} - 3d$
  4      $a_0 + 4a_1 + n_4 + d$    $a_1 + n_{14} + d$     $n_{24} - d$       $n_{34} - 2d$      $n_{44} + 2d$
  5      $a_0 + 5a_1 + n_5 + d$    $a_1 + n_{15}$         $n_{25} - d$       $n_{35}$           $n_{45} + 2d$
  6      $a_0 + 6a_1 + n_6$        $a_1 + n_{16} - d$     $n_{26} + d$       $n_{36} + 2d$      $n_{46} - 3d$
  7      $a_0 + 7a_1 + n_7$        $a_1 + n_{17}$                            $n_{37} - d$
of adjacent disturbances of the same sign. (The possibility
of using reduced thresholds for this situation has not been
explored.) The magnitudes of the $D_{3i}$'s are also smaller than
in the single disturbance situation and are separated by an
observation ($D_{35}$) involving noise only.

Next,  consider  adjacent  disturbances  of  equal  magnitudes 
but  opposite  signs.   This  situation  is  presented  in  Table  3.3 
and  Figure  3.3.   The  additive,  or  magnification,  effect  of  the 
opposing  signs  should  make  even  moderate  magnitudes  of  the 
disturbances  readily  detectable.   The  pattern  or  signature 
should  be  clearly  evident.   It  is  suspected,  however,  that  the 
occurrence  of  this  situation  in  real-life  data  would  be 
extremely  rare  in  comparison  to  the  previous  situation. 
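The signatures of the different disturbance patterns can be generated directly from the difference weights. The following Python sketch is illustrative; `disturbance_signature` is an assumed name, contributions are expressed in units of $d$, noise and polynomial terms are omitted, and the index convention is the one implied by the noise formulas of Section D.

```python
from math import comb


def disturbance_signature(pattern, j):
    """Noise-free contribution of a disturbance pattern to each D_{ji}.

    pattern maps sample index -> disturbance in units of d.
    Returns {i: contribution}, listing only the non-zero entries.
    """
    out = {}
    lo, hi = min(pattern), max(pattern)
    for i in range(lo - j, hi + j + 1):
        top = i + j // 2  # newest sample entering D_{ji}
        total = sum((-1) ** m * comb(j, m) * pattern.get(top - m, 0)
                    for m in range(j + 1))
        if total:
            out[i] = total
    return out


isolated = disturbance_signature({4: 1}, 4)         # one disturbance in x_4
same_sign = disturbance_signature({4: 1, 5: 1}, 4)  # equal disturbances in x_4, x_5
opposed = disturbance_signature({4: 1, 5: -1}, 4)   # opposed disturbances in x_4, x_5
```

The peak fourth order contribution is $6d$ for the isolated case, falls to $2d$ for adjacent disturbances of the same sign, and rises to $10d$ for adjacent disturbances of opposite signs, which is the reduction and magnification described in the text.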

The situation in which two disturbances of similar
magnitude and sign are separated by one unperturbed data point is
presented in Table 3.4 and Figure 3.4. From the graph it can
be seen that this situation looks much like a situation with
a single isolated disturbance of somewhat greater magnitude
and opposite sign (Fig. 3.1). This brings the danger that
the observation $x_5$ (between the two observations with
disturbances) could be erroneously labeled as an outlier and hence
removed and treated as a missing point. In the next section
missing points and their replacement by the average of the
observations on each side of the missing point will be discussed.
TABLE 3.3
SUCCESSIVE DIFFERENCES

Linear Case:   Adjacent Opposed Equal Disturbances

TABLE 3.4
SUCCESSIVE DIFFERENCES

Linear Case:   Two Disturbances Separated by One Point
This treatment would introduce the disturbance $d$ in the new
value for $x_5$ and hence lead to three adjacent equal disturbances.
The latter situation is presented in Table 3.5 and Figure 3.5.
Note, first, that removal of an observation and replacement of
the missing point should be followed by recalculation of the
ordered differences affected and, second, that the magnitudes
of the contributions of the disturbances to the ordered
differences are substantially reduced from the contributions in either
the isolated disturbance situation or the separated
disturbances situation. In this modified situation the reduced
thresholds presented in the next section will improve the
capability of indicating the presence of the two separated
disturbances. A threshold crossing by any of the $D_{4i}$'s with
$i = 3, 4, 5, 6$ in the modified results should serve as an
indicator that disturbances may be present in $x_4$ and $x_6$
rather than in $x_5$.
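The screening, replacement and recalculation steps described above can be sketched as follows. This is a hypothetical Python illustration, not the procedure of the smoothing program: the function names, the rule of treating the largest exceedance first, and the pass limit are all assumptions made for the example. Only the fourth order test, the threshold $3\sqrt{70}\,\sigma$ and the replacement by the average of the two neighbours are taken from the text.

```python
def fourth_difference(x, i):
    """D_{4i} centred on x[i]."""
    return x[i + 2] - 4 * x[i + 1] + 6 * x[i] - 4 * x[i - 1] + x[i - 2]


def screen_outliers(x, sigma, max_passes=10):
    """Flag points whose |D_{4i}| exceeds 3*sqrt(70)*sigma.

    Assumed strategy: on each pass take the largest exceedance, replace
    that point by the mean of its neighbours, then recompute all D_{4i}.
    Returns (cleaned data, list of flagged indices).
    """
    x = list(x)
    flagged = []
    if len(x) < 5:
        return x, flagged  # no fourth difference can be formed
    threshold = 3.0 * 70.0 ** 0.5 * sigma
    for _ in range(max_passes):
        d4 = {i: fourth_difference(x, i) for i in range(2, len(x) - 2)}
        worst = max(d4, key=lambda i: abs(d4[i]))
        if abs(d4[worst]) <= threshold:
            break
        flagged.append(worst)
        x[worst] = 0.5 * (x[worst - 1] + x[worst + 1])
    return x, flagged


line = [10.0 + 3.0 * t for t in range(11)]     # noise-free linear path
spiked = list(line)
spiked[5] += 8.0                               # isolated disturbance, d = 8 sigma
cleaned, flagged = screen_outliers(spiked, sigma=1.0)

pair = list(line)
pair[4] += 8.0
pair[6] += 8.0                                 # disturbances separated by one point
cleaned_pair, flagged_pair = screen_outliers(pair, sigma=1.0)
```

In the second case the sketch flags the unperturbed middle point, replaces it with a value that carries the disturbance, and then finds no further crossing at the full threshold, since three adjacent equal disturbances contribute at most $3d$ to any $D_{4i}$. This reproduces the danger noted in the text and suggests why reduced thresholds are of interest after a replacement.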

In addition to the occurrence of three adjacent and
equal disturbances in the treatment of two such disturbances
by replacing missing points, it is possible that this situation
can occur due to the persistence of the perturbation causing
the disturbances. The lower disturbance contributions to the
ordered differences could readily fail to produce a threshold
crossing, as could the situation with two adjacent equal
disturbances, whereas the situation with an isolated disturbance
TABLE 3.5
SUCCESSIVE DIFFERENCES

Linear Case:   Three Adjacent Equal Disturbances
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of  the  same  magnitude  would  yield  a  threshold  crossing.   These 
situations  with  more  than  one  adjacent,  equal  disturbances  may 
require  greater  consideration  of  the  signatures  identifying 
them.   (See  Figures  3.2  and  3.5.)   Such  modifications  are  not 
examined  further  in  this  report. 

For the present, it will be assumed that successive
differences will be incorporated in a data smoothing algorithm
for the two purposes discussed in the introduction (Section 1),
namely, identifying outliers and indicating appropriate order
polynomials for fitting the data. There are two ways that
sequential differences can be used in identifying outliers.
One is as a preliminary screening to remove some of the more
obvious outliers, to be followed by a reexamination for outliers
in the curve fitting portion of the data smoothing algorithm
as presently incorporated in the general track smoothing program
MASM3DRJ. The other approach would require sequential
differences to provide the only means of identifying outliers. As
indicated by the comparatively simple situations considered
here, this would require considerably more model development
and would become a considerably larger portion of a data
smoothing program. For the purposes of this report, the first
approach will be considered appropriate.

A situation with two equal disturbances separated by
two unperturbed observations is presented in Table 3.6 and
Figure 3.6. It should be observed that when disturbances are
separated by as few as two points they can be considered
essentially as isolated disturbances. (See Table 3.1 and
Figure 3.1.)

TABLE 3.6
SUCCESSIVE DIFFERENCES

Linear Case: Two Equal Disturbances Separated by Two Points

There are other types of perturbations that could, and
possibly should, be considered for potential identification by
successive differences. Only one of these will be examined here.
This is the situation in which the torpedo changes from a
linear path at t_r to a different linear path at t_{r+1}. This
situation is presented in Table 3.7 and Figure 3.7. As can be
seen by comparison of Table 3.7 with Table 3.1, it is possible
that a path change at t = r could lead to the identification
of x_r as containing a disturbance d, depending on the magnitudes
of Δ_1 and d. The resemblance of the signature (graph) of
D_{4i} in the two situations could be even more striking for a
value of d such that D_{4,2} of Table 3.1 (corresponding to
D_{4,r-2} of Table 3.7) were small enough to be submerged in noise
and Δ_1 = 3d. That a path change could conceivably cause a
crossing of the threshold D*_4 by D_{4i} can be seen in the case of
a 90° change from θ = 0° to θ' = 90° (or vice versa), where
|Δ_1| = |v| = 90. The situation is even worse for a 90°
change from θ = 45° to θ' = 135°, with |Δ_1| = 1.4(90) = 126.
Possible methods of identifying path changes to prevent
mis-identification as outliers include reconsideration of
labeled outliers after fitting curves to the data and provision
of information from an external source such as control
information. The first method requires greater complication of
the data smoothing program, involving cycling, and hence negates
the intent of a simple screening program for outliers. The
second requires input information from another source and is
also undesirable, but to a lesser extent. An alternative
treatment is to accept such identification of the point of path
change as providing an outlier to be removed from the data. This
treatment, whose consequences will be examined in a subsequent
report on curve fitting, appears, at least for the present, to
be a reasonable way of handling the situation.

TABLE 3.7

Linear Case: Path Change at t_i = r
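The resemblance between the two signatures can be checked numerically. The following is a minimal sketch only: the helper name `successive_differences`, the nine-point series, and the values chosen for a_0, a_1, d and the slope change are assumptions made for illustration and are not taken from the data of this report.

```python
def successive_differences(x, order):
    """Return the successive differences of the given order of the list x."""
    for _ in range(order):
        x = [b - a for a, b in zip(x, x[1:])]
    return x

a0, a1 = 10.0, 2.0      # assumed linear path x_i = a0 + a1 * t_i
d = 5.0                 # assumed isolated disturbance placed in x_4
delta = 3.0 * d         # assumed slope change after t = 4 (Delta_1 = 3d)
t = range(9)

linear = [a0 + a1 * ti for ti in t]
isolated = [xi + (d if ti == 4 else 0.0) for ti, xi in zip(t, linear)]
path_change = [xi + delta * max(0, ti - 4) for ti, xi in zip(t, linear)]

d4_isolated = successive_differences(isolated, 4)   # d * (1, -4, 6, -4, 1)
d4_path = successive_differences(path_change, 4)    # delta * (0, 1, -2, 1, 0)
print(d4_isolated)
print(d4_path)
```

With Δ_1 = 3d the three central terms are (15, -30, 15) for the path change against (-20, 30, -20) for the isolated disturbance, so once the outer terms of magnitude d are lost in noise the two patterns differ only in sign and scale.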

There is still another kind of perturbation which can
occur, and has been observed to occur. This is a change in the
noise component, represented by a change in the value of the
standard deviation σ. Such changes may be a result of changes
in the environment or of the data gathering system. Evidence
of such changes in the value of σ should be accommodated by
corresponding changes in the threshold levels.


F.   Missing Points

The  occurrence  of  missing  observations  in  a  sequence 
of  observations  needs  some  consideration.   A  missing  observation 
can  be  present  in  the  data  input  or  occur  as  a  result  of  deletion 
of  an  outlier.   Note  that,  in  the  latter  case,  recalculation  of 
successive  differences  will  be  required  in  the  vicinity  of 
the  deleted  observation. 

As the simplest procedure for replacing missing points,
the currently used procedure of averaging over the adjacent
points will be used here. (This also will be re-examined when
curve-fitting is discussed.) Thus, when x_r is missing it
will be replaced by

    x'_r = \frac{1}{2} (x_{r-1} + x_{r+1})

and when adjacent values x_r and x_{r+1} are missing they will
be replaced by

    x'_r     = x_{r-1} + \frac{1}{3} (x_{r+2} - x_{r-1}) = \frac{1}{3} (2x_{r-1} + x_{r+2})

    x'_{r+1} = x_{r-1} + \frac{2}{3} (x_{r+2} - x_{r-1}) = \frac{1}{3} (x_{r-1} + 2x_{r+2}) .

The general formula for k successive missing points is

    x'_{r+j} = x_{r-1} + \frac{j+1}{k+1} (x_{r+k} - x_{r-1})    for j = 0, ..., k-1.
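The general formula can be sketched in a few lines. This is an illustrative sketch, not the routine used in the existing smoothing program: the function name `fill_missing` and the convention of marking missing observations with `None` are assumptions.

```python
def fill_missing(x):
    """Replace each run of None in x by linear interpolation between the
    nearest observed neighbours; gaps at either end are left as None."""
    filled = list(x)
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                      # index r of the first missing point
        while i < n and filled[i] is None:
            i += 1
        k = i - start                  # number of successive missing points
        if start == 0 or i == n:
            continue                   # no observed neighbour on one side
        left, right = filled[start - 1], filled[i]   # x_{r-1} and x_{r+k}
        for j in range(k):
            filled[start + j] = left + (j + 1) / (k + 1) * (right - left)
    return filled

print(fill_missing([1.0, None, 5.0]))               # one missing point
print(fill_missing([0.0, 3.0, None, None, 12.0]))   # two adjacent missing points
```

For k = 1 this reduces to the average of the two adjacent observations, and for k = 2 it gives the one-third and two-thirds points between x_{r-1} and x_{r+2}.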


There is a serious question, however, whether an analysis of
successive differences is improved by replacement of more than
two successive missing values. It would appear more reasonable,
at least on examination of the fourth order successive
differences, which involve only sequences of five observations, to
restart calculation of successive differences at the first
observation after a sequence of more than two missing
observations.

The situation involving a missing point with linear
polynomial and noise components only is presented in Table 4.1,
together with the accompanying definitions of the modified noise
components and their variances. Reduced thresholds could
be used as indicated in Table 4.2 and Figure 4.1. These reduced
thresholds could be useful in identifying situations involving
equal disturbances separated by one observation where that
observation is labeled as an outlier and replaced by the
average of the two observations with disturbances. Recalculation
of the fourth order differences produces the disturbance
components given in the last column of Table 3.5, which are
shown with the modified thresholds in Figure 4.2. (This
situation is the same as for two disturbances separated by a
missing point.) Persistence of a threshold crossing at t_r
after deletion and replacement of the observation x_r can
be an indication that disturbances may be present in
x_{r-1} and x_{r+1} instead of, or in addition to, a disturbance
in x_r.

TABLE 4.1

Linear Case: Missing Point (x_4) Averaged

t_i   x_i                 D_{1i}           D_{2i}         D_{3i}     D_{4i}
0     a_0 + n_0           a_1 + n_{11}
1     a_0 + a_1 + n_1     a_1 + n_{12}     n_{21}         n_{32}
2     a_0 + 2a_1 + n_2    a_1 + n_{13}     n_{22}         n*_{33}    n*_{42}
3     a_0 + 3a_1 + n_3    a_1 + n*_{14}    n*_{23}        n*_{34}    n*_{43}
4     x'_4                a_1 + n*_{15}    n*_{24} = 0    n*_{35}    n*_{44}
5     a_0 + 5a_1 + n_5    a_1 + n_{16}     n*_{25}        n*_{36}    n*_{45}
6     a_0 + 6a_1 + n_6    a_1 + n_{17}     n_{26}         n_{37}     n*_{46}
7     a_0 + 7a_1 + n_7    a_1 + n_{18}     n_{27}
8     a_0 + 8a_1 + n_8

TABLE 4.1 Continued

x'_4 = \frac{1}{2}(x_3 + x_5) = a_0 + 4a_1 + n*_4 ,
n*_4 = \frac{1}{2}(n_3 + n_5) ,                              σ²_{n*_4} = \frac{1}{2} σ²

n*_{14} = n*_{15} = \frac{1}{2}(n_5 - n_3) ,                 σ²_{n*_{14}} = σ²_{n*_{15}} = \frac{1}{2} σ²

n*_{23} = \frac{1}{2}(n_5 - 3n_3 + 2n_2) ,                   σ²_{n*_{23}} = \frac{7}{2} σ²
n*_{24} = 0
n*_{25} = \frac{1}{2}(2n_6 - 3n_5 + n_3) ,                   σ²_{n*_{25}} = \frac{7}{2} σ²

n*_{33} = \frac{1}{2}(n_5 - 5n_3 + 6n_2 - 2n_1) ,            σ²_{n*_{33}} = \frac{33}{2} σ²
n*_{34} = -n*_{23}
n*_{35} = n*_{25}
n*_{36} = \frac{1}{2}(2n_7 - 6n_6 + 5n_5 - n_3) ,            σ²_{n*_{36}} = \frac{33}{2} σ²

n*_{42} = \frac{1}{2}(n_5 - 7n_3 + 12n_2 - 8n_1 + 2n_0) ,    σ²_{n*_{42}} = \frac{131}{2} σ²
n*_{43} = -n_5 + 4n_3 - 4n_2 + n_1 ,                         σ²_{n*_{43}} = 34 σ²
n*_{44} = n_6 - n_5 - n_3 + n_2 ,                            σ²_{n*_{44}} = 4 σ²
n*_{45} = n_7 - 4n_6 + 4n_5 - n_3 ,                          σ²_{n*_{45}} = 34 σ²
n*_{46} = \frac{1}{2}(2n_8 - 8n_7 + 12n_6 - 7n_5 + n_3) ,    σ²_{n*_{46}} = \frac{131}{2} σ²

σ²_{n*_{ji}} < σ²_{n_{ji}}   for all j, i.

TABLE 4.2

Linear Case: Detection Thresholds for Missing Point Datum at r

D*_{ji} = 3σ_{n*_{ji}} ,   Table Values for 3σ_{n*_{ji}}/σ

t_i    x_i     D*_{1i}/σ   D*_{2i}/σ   D*_{3i}/σ   D*_{4i}/σ
r-4    3       4.24
r-3    3       4.24        7.35        13.4        25.1
r-2    3       4.24        7.35        12.2        24.3
r-1    3       2.10        5.61        5.61        17.5
r      2.1     2.10        0           5.61        6.0
r+1    3       4.24        5.61        12.2        17.5
r+2    3       4.24        7.35        13.4        24.3
r+3    3       4.24        7.35                    25.1
r+4    3

FIGURE 4.1
Thresholds in Vicinity of a Missing Point
[Plot of the thresholds ±D*_{2i}, ±D*_{3i} and ±D*_{4i} against
t_i = r-3, ..., r+4; vertical axis in multiples of σ, from -25σ
to 30σ.]
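The reduced thresholds in the vicinity of a single averaged missing point can be recomputed from the noise coefficients alone. The sketch below is an illustration under stated assumptions: nine observations with independent noise of common variance σ², the point at index 4 replaced by the average of its neighbours, and hypothetical helper names `successive_differences` and `threshold_factors`. Each observation is represented by its vector of coefficients on n_0, ..., n_8, so the variance of any difference is σ² times the sum of its squared coefficients.

```python
import math

def successive_differences(rows, order):
    """Successive differences of a list of coefficient vectors."""
    for _ in range(order):
        rows = [[b - a for a, b in zip(p, q)] for p, q in zip(rows, rows[1:])]
    return rows

def threshold_factors(n_points, missing, order):
    """Return 3*sigma_n*/sigma for every difference of the given order when
    the observation at index `missing` is replaced by the mean of its
    two neighbours."""
    coeffs = [[1.0 if i == j else 0.0 for j in range(n_points)]
              for i in range(n_points)]
    coeffs[missing] = [(p + q) / 2.0 for p, q in
                       zip(coeffs[missing - 1], coeffs[missing + 1])]
    diffs = successive_differences(coeffs, order)
    return [3.0 * math.sqrt(sum(c * c for c in row)) for row in diffs]

# Fourth order differences centred on t = 2, ..., 6 with x_4 averaged.
factors = threshold_factors(n_points=9, missing=4, order=4)
print([round(f, 1) for f in factors])
```

The same function with `order` set to 2 or 3 gives the corresponding factors for the second and third order differences.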

Some additional work is required here to assist in
developing that portion of the data smoothing program dealing
with successive differences. It is fairly clear that the
existence of a threshold crossing requires more effort to
determine whether it indicates an isolated outlier or a more
complicated situation. A situation with two adjacent missing
observations and no disturbances is displayed in Table 4.3,
accompanied by the expressions for the noise components in
terms of the observational noise. The variances for the noise
components presented there provide the basis for the thresholds
shown in Table 4.4. The thresholds for the isolated missing
point situation are also shown in Table 4.4. Note that the
thresholds in the two missing points situation are smaller
than the corresponding ones in a situation with a single
missing point.

A situation in which a disturbance occurs in an observation
adjacent to a missing point is presented in Table 4.5.
(It is suspected that situations involving one or more missing
points could also involve disturbances immediately preceding
or following a missing point due to deterioration of physical
conditions.) The disturbance components are shown in
relationship to the common thresholds appropriate when there are
no missing points in Figure 4.3 and to the reduced thresholds in
Figures 4.4, 4.5 and 4.6. It can be seen that the use of the
modified thresholds can substantially increase the likelihood of
threshold crossings in the vicinity of a missing point.

FIGURE 4.2
Two Disturbances Separated by a Missing Point Averaged
[Plot of the fourth order differences D_{4i} for d = 5σ against
the thresholds ±D*_{4i}; vertical axis in multiples of σ.]

TABLE 4.3

Linear Case: Adjacent Missing Points Averaged

t_i   x_i                 D_{1i}           D_{2i}         D_{3i}         D_{4i}
0     a_0 + n_0           a_1 + n_{11}
1     a_0 + a_1 + n_1     a_1 + n_{12}     n_{21}         n_{32}
2     a_0 + 2a_1 + n_2    a_1 + n_{13}     n_{22}         n*_{33}        n*_{42}
3     a_0 + 3a_1 + n_3    a_1 + n*_{14}    n*_{23}        n*_{34}        n*_{43}
4     x'_4                a_1 + n*_{15}    n*_{24} = 0    n*_{35} = 0    n*_{44}
5     x'_5                a_1 + n*_{16}    n*_{25} = 0    n*_{36}        n*_{45}
6     a_0 + 6a_1 + n_6    a_1 + n_{17}     n*_{26}        n*_{37}        n*_{46}
7     a_0 + 7a_1 + n_7    a_1 + n_{18}     n_{27}         n_{38}         n*_{47}
8     a_0 + 8a_1 + n_8    a_1 + n_{19}     n_{28}
9     a_0 + 9a_1 + n_9

TABLE 4.3 Continued

x'_4 = x_3 + \frac{1}{3}(x_6 - x_3) = a_0 + 4a_1 + n*_4 ,
n*_4 = \frac{1}{3} n_6 + \frac{2}{3} n_3 ,                     σ²_{n*_4} = \frac{5}{9} σ²

x'_5 = x_3 + \frac{2}{3}(x_6 - x_3) = a_0 + 5a_1 + n*_5 ,
n*_5 = \frac{2}{3} n_6 + \frac{1}{3} n_3 ,                     σ²_{n*_5} = \frac{5}{9} σ²

n*_{14} = n*_{15} = n*_{16} = \frac{1}{3}(n_6 - n_3) ,         σ² = \frac{2}{9} σ²

n*_{23} = \frac{1}{3}(n_6 - 4n_3 + 3n_2) ,
n*_{26} = \frac{1}{3}(3n_7 - 4n_6 + n_3) ,                     σ²_{n*_{23}} = σ²_{n*_{26}} = \frac{26}{9} σ²
n*_{24} = n*_{25} = 0

n*_{33} = \frac{1}{3}(n_6 - 7n_3 + 9n_2 - 3n_1) ,
n*_{37} = \frac{1}{3}(3n_8 - 9n_7 + 7n_6 - n_3) ,              σ²_{n*_{33}} = σ²_{n*_{37}} = \frac{140}{9} σ²
n*_{34} = -n*_{23} ,   n*_{35} = 0 ,   n*_{36} = n*_{26}

n*_{42} = \frac{1}{3}(n_6 - 10n_3 + 18n_2 - 12n_1 + 3n_0) ,
n*_{47} = \frac{1}{3}(3n_9 - 12n_8 + 18n_7 - 10n_6 + n_3) ,    σ²_{n*_{42}} = σ²_{n*_{47}} = \frac{578}{9} σ²

n*_{43} = \frac{1}{3}(-2n_6 + 11n_3 - 12n_2 + 3n_1) ,
n*_{46} = \frac{1}{3}(3n_8 - 12n_7 + 11n_6 - 2n_3) ,           σ²_{n*_{43}} = σ²_{n*_{46}} = \frac{278}{9} σ²

n*_{44} = -n*_{34} ,   n*_{45} = n*_{36}

TABLE 4.4

THRESHOLDS FOR NOISE IN ONE AND TWO MISSING POINT SITUATIONS

k such that D*_{ji} = 3σ_{n*_{ji}} = kσ

        One missing point (x_r)            Two missing points (x_r, x_{r+1})
t_i     x_i    j=1    j=2    j=3    j=4    x_i    j=1    j=2    j=3    j=4
r-4     3      4.24   7.35   13.4   25.1   3      4.24   7.35   13.4   25.1
r-3     3      4.24   7.35   13.4   25.1   3      4.24   7.35   13.4   25.1
r-2     3      4.24   7.35   12.2   24.3   3      4.24   7.35   11.8   24.0
r-1     3      2.10   5.61   5.61   17.5   3      1.41   5.1    5.1    16.7
r       2.10   2.10   0      5.61   6.0    2.24   1.41   0      0      5.1
r+1     3      4.24   5.61   12.2   17.5   2.24   1.41   0      5.1    5.1
r+2     3      4.24   7.35   13.4   24.3   3      4.24   5.1    11.8   16.7
r+3     3      4.24   7.35   13.4   25.1   3      4.24   7.35   13.4   24.0
r+4     3      4.24   7.35   13.4   25.1   3      4.24   7.35   13.4   25.1
r+5     3      4.24   7.35   13.4   25.1   3      4.24   7.35   13.4   25.1

TABLE 4.5

Linear Case: Disturbance Following Missing Point

t_i  x_i                    D_{1i}                 D_{2i}              D_{3i}              D_{4i}
0    a_0 + n_0              a_1 + n_{11}
1    a_0 + a_1 + n_1        a_1 + n_{12}           n_{21}              n_{32}
2    a_0 + 2a_1 + n_2       a_1 + n_{13}           n_{22}              n*_{33} + d/2       n*_{42} + d/2
3    a_0 + 3a_1 + n_3       a_1 + n*_{14} + d/2    n*_{23} + d/2       n*_{34} - d/2       n*_{43} - d
4    x'_4                   a_1 + n*_{15} + d/2    n*_{24} = 0         n*_{35} - (3/2)d    n*_{44} - d
5    a_0 + 5a_1 + n_5 + d   a_1 + n_{16} - d       n*_{25} - (3/2)d    n*_{36} + (5/2)d    n*_{45} + 4d
6    a_0 + 6a_1 + n_6       a_1 + n_{17}           n_{26} + d          n_{37} - d          n*_{46} - (7/2)d
7    a_0 + 7a_1 + n_7       a_1 + n_{18}           n_{27}              n_{38}              n_{47} + d
8    a_0 + 8a_1 + n_8       a_1 + n_{19}           n_{28}
9    a_0 + 9a_1 + n_9

x'_4 = \frac{1}{2}(x_3 + x_5) = a_0 + 4a_1 + n*_4 + d/2 ,    n*_4 = \frac{1}{2}(n_3 + n_5)
(For the σ_{n*_{ji}}'s see Missing Point Table, Table 4.1.)

Linear Case: Second Order Differences vs Thresholds
Disturbance Following Missing Point
[Plot.]

Linear Case: Third Order Differences vs Thresholds
Disturbance Following Missing Point
[Plot.]

FIGURE 4.6
Linear Case: Fourth Order Differences vs Thresholds
Disturbance Following Missing Point
[Plot; vertical axis in multiples of σ, from -25σ to 25σ.]

Examination of the effects of missing points on the
ability of successive differences to indicate the presence of
disturbances is not complete. For example, situations with
disturbances preceding and/or following adjacent missing
points have not been examined. Nevertheless, some indication
of how missing points should be handled when successive
differences are used to screen 3-D data for outliers can be
suggested at this point in the development. Under the guiding
principle of keeping the data smoothing program as short and
simple as possible, and with the understanding that a further
screening for outliers could be included in the curve fitting
portion of the program, the following steps appear reasonable:

(1)  Supply missing points using the averaging method.

(2)  Screen for outliers using the fourth order differences
     D_{4i} and the common threshold D*_4.

(3)  Replace any outliers found, using the averaging method.

(4)  Screen for outliers in the vicinity of any values
     replaced in Step 3 (not those in Step 1) using the
     reduced thresholds D*_{4i} for the D_{4i}'s.

(5)  Any outliers found in Step 4 should be referred for manual
     examination, at least until further development can provide
     satisfactory provisions for inclusion in the smoothing
     program.


G.   Noise Variance

In Section 2.D, it was assumed that the noise components
of the data were normally and independently distributed with
zero means and common variance σ². This variance, or more
specifically the standard deviation σ, must be known before
the thresholds discussed in Sections 2.D, E, and F can be
specified. Selection of an appropriate value for σ requires
more detailed examination. Three potential sources of values
for σ will be considered here.

In Reference 1, which incidentally used path segments
from the same set of data to be used in this study, sample
standard deviations of magnitudes S_x = 2 or 3 were calculated
for some path segments. Sample standard deviations provide
the primary sources of information on the value of σ and hence
are of considerable interest in setting threshold values. They
can be, unfortunately, contaminated by the polynomial components
in the observations, as was demonstrated in the reference.
Nevertheless, a value of the order of σ = 3 or σ = 4 is
an approximation which could be used in setting thresholds for
screening for outliers. Experience with larger samples including
other runs will provide a more reasonable basis for estimating
σ.

It is to be expected that there will be spatial and
temporal variations in σ. Spatial variations can be present
because of the geometry of the vehicle-sensor orientation. Data
from which the value of σ and its spatial variations can be
determined should be available from previous and continuing
calibration data collected on the position location system.
Information on temporal variations should be available from the
same source and should also be monitored during the collection
of any data for which data smoothing is to be performed. It
should also be expected that there will be interaction between
spatial and temporal variation in σ, i.e., that the temporal
variation can be different for different locations on the path
of the vehicle being tracked. This would imply that the
thresholds to be used for indicating outliers may, and probably
should, be changed depending on the location and time of the
data to be smoothed.


The third potential source for information on σ is
the data to be smoothed. A single estimate S_x for σ = σ_x
may be calculated from the complete set of data or estimates
may be calculated for segments of the data. These can be
expected to be contaminated by both the polynomial and
perturbation components in the data. Reduction in the polynomial
component contribution could be obtained by using successive
differences as the source of the estimates. Thus, for example,
the sample variance of the fourth order differences

    S_4^2 = \frac{1}{n-1} \sum_{i=1}^{n} (D_{4i} - \bar{D}_4)^2

where

    \bar{D}_4 = \frac{1}{n} \sum_{i=1}^{n} D_{4i}

could be used as an estimate of

    σ²_{n_4} = 70 σ²

leading to the estimate

    \hat{σ} = S_4 / \sqrt{70} .

This should have little or no contamination from the
polynomial components of the observations. If the outliers are
reasonably rare, the perturbation contributions should also
be small and the resulting estimate could be a reasonable
alternative. Note that estimates of σ could be obtained for
segments of the data and hence could be made to respond to the
spatial and temporal variations in σ discussed in the second
alternative.
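A minimal sketch of this third method follows. It is an illustration on synthetic data, not a result from the test data: the function names, the quadratic path, the seed, and the value σ = 3 are assumptions, and the factor 70 is the sum of the squared fourth-difference coefficients (1, 4, 6, 4, 1) for independent noise.

```python
import math
import random
import statistics

def successive_differences(x, order):
    """Return the successive differences of the given order of the list x."""
    for _ in range(order):
        x = [b - a for a, b in zip(x, x[1:])]
    return x

def estimate_sigma(x):
    """Estimate the noise standard deviation from the sample variance of
    the fourth order differences, using var(D_4i) = 70 * sigma**2."""
    d4 = successive_differences(x, 4)
    return statistics.stdev(d4) / math.sqrt(70.0)

random.seed(1)
true_sigma = 3.0                       # assumed noise level for the example
t = range(2000)
# Assumed quadratic path plus independent normal noise; the polynomial part
# vanishes in the fourth differences, so it does not bias the estimate.
x = [5.0 + 0.8 * ti - 0.002 * ti ** 2 + random.gauss(0.0, true_sigma)
     for ti in t]
sigma_hat = estimate_sigma(x)
print(round(sigma_hat, 2))
```

Applying `estimate_sigma` to successive segments of a run, rather than to the whole run, would give estimates that follow spatial or temporal changes in σ, at the cost of larger sampling error per segment.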

This third method of estimating σ has a direct
relationship to the method (Grubbs') incorporated in the
currently used program for identifying outliers. In fact, a
Grubbs type of screening could be performed with the sample
variances of successive differences, where an observation is
labeled an outlier if its removal provides a substantial
reduction in the sample variance. This possibility has not
been explored.


H.   Algorithm for Identifying Outliers

The following algorithm is suggested for identification
and removal of gross outliers. Two basic principles are
considered essential:

(1)  The algorithm should be simple and short.

(2)  A subsequent and more thorough search for outliers will
     be incorporated in the data smoothing program concurrent
     with or following the curve-fitting portion of the program.

The steps of the algorithm are:

1.  Calculate values for missing points using the method
    of averaging.

2.  Calculate the fourth order differences D_{4i}.

3.  Identify as outliers and remove from the data any x_k for
    which |D_{4k}| > 25.1σ. (The suggested value of σ to be
    used here is a number between 3 and 4.)

4.  Replace any x_k identified as an outlier in Step 3 using
    the averaging method as in Step 1.

5.  Recalculate the fourth order differences which involve x_k.
    (These are D_{4,k-2}, ..., D_{4,k+2}.)

6.  Re-examine the modified fourth order differences of Step 5
    for outliers as in Step 3.

7.  If additional outliers are found in Step 6, either
    additional steps must be designed to locate potential
    outliers in the vicinity of the observation x_k (from
    Step 3) or the problem must be identified for manual
    treatment.
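Steps 2 through 7 can be sketched as a single screening pass. The sketch below is illustrative only and rests on several assumptions: missing points are taken to have been supplied already (Step 1), the function names are hypothetical, and when several adjacent fourth differences cross the threshold only the observation with the largest |D_{4k}| is treated, since an isolated disturbance also inflates its neighbours' differences.

```python
def fourth_differences(x):
    """Centered fourth differences D_4i for i = 2, ..., len(x) - 3."""
    return {i: x[i - 2] - 4 * x[i - 1] + 6 * x[i] - 4 * x[i + 1] + x[i + 2]
            for i in range(2, len(x) - 2)}

def screen_once(x, sigma, factor=25.1):
    """One pass of Steps 2-7. Returns the treated series, the index k of
    the replaced observation (or None), and the indices near k whose
    fourth differences still cross the threshold (manual treatment)."""
    x = list(x)
    threshold = factor * sigma
    d4 = fourth_differences(x)                              # Step 2
    crossings = [i for i, v in d4.items() if abs(v) > threshold]
    if not crossings:
        return x, None, []
    k = max(crossings, key=lambda i: abs(d4[i]))            # Step 3
    x[k] = 0.5 * (x[k - 1] + x[k + 1])                      # Step 4
    d4 = fourth_differences(x)                              # Step 5
    remaining = [i for i in range(k - 2, k + 3)
                 if i in d4 and abs(d4[i]) > threshold]     # Step 6
    return x, k, remaining                                  # Step 7

# Assumed example: a noise-free linear path with one gross outlier.
series = [2.0 * i for i in range(20)]
series[10] += 200.0
cleaned, k, remaining = screen_once(series, sigma=4.0)
print(k, cleaned[10], remaining)
```

For brevity the sketch recomputes every fourth difference in Step 5 rather than only D_{4,k-2}, ..., D_{4,k+2}; the result is the same because the other differences do not involve x_k.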


I.   Identifying Polynomial Components

In using successive differences to indicate the appropriate
degree of the polynomial component P(t), attention is directed
to the sequence of signs of the differences of the same order.
The reasoning for this is as follows. In Section 2.D it
was established that the noise component n_{ji} of the i-th
difference of order j is a linear combination of the n_i's
(the noise components of the observations). If the N_i's
(the random variables of which the n_i's are realizations
and hence the noise components of the observations x_i)
have zero means as assumed in Section 2.A, then the N_{ji}'s
will also have zero means. In any sequence of differences
of order j, the mean value of the differences

    \bar{N}_j = \frac{1}{n} \sum_{i=1}^{n} N_{ji}

will also have zero mean. In the absence of a polynomial
component with a term a_r t^r and without a disturbance
component, the r-th order difference terms are D_{ri} = n_{ri}
and hence the mean value

    \bar{D}_r = \frac{1}{n} \sum_{i=1}^{n} D_{ri} = \frac{1}{n} \sum_{i=1}^{n} n_{ri}

should be near zero. A sequence of differences of order r
having the same sign will have a mean value with that same sign
and hence can be interpreted as an indication of the presence of
another component. Further, an isolated disturbance will provide
contributions of alternating signs to a sequence of differences
of order r. Thus the reasonable interpretation of a
sequence of similar signs is the presence of a polynomial
contribution from a_r to the D_{ri}'s.

Note that values of a_r which are small with respect to
the noise components of the D_{ri}'s (i.e., small in comparison
to σ_{N_r}) can fail to cause the sequence of D_{ri}'s to have the
sign of a_r, since a_r will no longer dominate the n_{ri}'s.
Thus the absence of a sequence of D_{ri}'s of the same sign
cannot be taken as an indication that the polynomial component
has degree less than r. However, the presence of a sequence of
differences of order r having the same sign should be
considered as an indication that the polynomial component will
be of degree at least r.

The nature of the property to be used for identification
of appropriate polynomial degree can, perhaps, best be
illustrated by a situation in which a polynomial of degree one
(P_1(t) = a_0 + a_1 t) is fitted by the method of least squares
to a set of data with a polynomial component of degree two (a
parabola P_2(t)) and a small noise component. The situation
might appear as sketched below.

[Sketch: the parabola P_2(t) (actual path) crossed at two points
by the straight line P_1(t) (fitted path).]

The residual errors e_i = x_i - P_1(t_i) have sequences of
similar signs (a sequence of negative signs, followed by a
sequence of positive signs, and ending with another sequence
of negative signs). Fitting a polynomial of degree two to the
same data should produce a polynomial very close to P_2(t),
with residuals close to the noise components and hence
with signs similar to the signs of the noise components, which
are random.
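The sign pattern of the residuals can be reproduced with a small example. This is a sketch under assumed values: the parabola coefficients, the eleven equally spaced times, the absence of noise, and the helper name `fit_line` are all chosen for illustration.

```python
def fit_line(t, x):
    """Least-squares straight line x = a0 + a1 * t; returns (a0, a1)."""
    n = len(t)
    t_mean = sum(t) / n
    x_mean = sum(x) / n
    sxy = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    a1 = sxy / sxx
    return x_mean - a1 * t_mean, a1

t = list(range(11))
# Assumed noise-free parabola P_2(t), opening downward.
x = [1.0 + 0.5 * ti - 0.3 * ti ** 2 for ti in t]

a0, a1 = fit_line(t, x)
residuals = [xi - (a0 + a1 * ti) for ti, xi in zip(t, x)]
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)
```

The residuals fall into a run of negative signs, a run of positive signs, and a final run of negative signs, which is the pattern described for a first degree fit to second degree data.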

The question of how long a sequence of D_{ri}'s of the
same sign is required to indicate the presence of a polynomial
term a_r has not been resolved. For any N_{ri}, the probability
that N_{ri} is greater than zero is 0.5. For k such independent
variables, the probability of a sequence of positive values,
i.e., the probability that a positive value will be followed by
k-1 positive values, is

    P(k positive values) = (0.5)^{k-1} ,

and

    P(k ≥ s) = 1 - P(k < s) = (0.5)^{s-1} .

Thus

    P(k ≥ 4) = 0.125,     P(k ≥ 5) = 0.06,     P(k ≥ 6) = 0.03.

Thus a sequence of six or more successive differences of the
same sign would be unlikely to occur due to noise alone,
if the noise components were independent. But the noise
components are not independent and, as established in Section
2.D, are negatively correlated. The probability P(k ≥ 5)
is substantially less than the value given above in the case
of independence, and it is suspected that a sequence of four
differences of order k with the same sign can be taken as an
indication that the polynomial component is at least of degree k.

The situation is complicated even further by the fact
that, for example, fourth order differences involve only five
consecutive observations but the contemplated length of data
segments considered for curve fitting is seven or eleven.
It is conceivable that a polynomial fitted to the five points
covered by a fourth order successive difference would be of
a lower degree than one fitted to a longer sequence. On the
other hand, if a polynomial of specified degree does not fit
a sample of given length very well, it cannot be expected to
fit a sample of greater length very well. Thus the
information obtained is of a negative form in that it can be
used to eliminate lower degree polynomials from further
consideration.
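The run probabilities quoted for the independent case follow from a one-line formula, sketched below. The function name is hypothetical, and the independence of signs is exactly the assumption that the text notes does not hold for successive differences, so these values are upper guides rather than the probabilities applicable to the D_{ri}'s.

```python
def prob_run_at_least(s, p=0.5):
    """Probability that a value of a given sign is followed by at least
    s - 1 further values of the same sign, for independent signs that
    each occur with probability p."""
    return p ** (s - 1)

for s in (4, 5, 6):
    print(s, prob_run_at_least(s))
```

The three values are 0.125, 0.0625 and 0.03125.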

There  is  a  temptation  to  apply  standard  sign  tests  or 
the  theory  of  runs  to  sequences  of  successive  differences. 
These,  however,  require  independence  of  noise  components  and 
would  involve  substantially  more  development  to  make  them 
suitable  for  incorporation.   They  could  be  useful  in  the 
curve-fitting  portion  of  the  data  smoothing  program  to  test 
whether  the  polynomial  degree  is  appropriate  by  testing 
whether  the  residual  errors  are  of  random  sign  or  whether 
sign  patterns  exist  as  illustrated  above. 
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3.   APPLICATION  OF  SUCCESSIVE  DIFFERENCES 

The  use  of  successive  differences  in  locating  outliers 
and  in  giving  indication  of  appropriate  polynomial  degree  for 
curve  fitting  will  be  illustrated  for  a  specific  set  of  3-D 
data.   This  data  was  obtained  from  a  test  in  which  a  torpedo 
was  launched  against  a  submarine  at  the  Naval  Undersea  Warfare 
Station.   The  3-D  data  involves  coordinates  recorded  at  equally 
spaced  times  with  very  few  data  points  missing.   Data  for  the 
x  and  y  coordinates  and  a  plot  containing  every  fifth  time 
is  provided  in  the  Appendix. 

Suppose, now, that a noise standard deviation value
$\sigma = 4$ is appropriate, so that the threshold level for the
fourth order differences is $D_4^* = 25.1\sigma = 100.4$. The first
threshold crossings in the data occur at $t_i = 908, 909, 910, 911$.
Table 5.1 shows the values of $x_i$, $y_i$ and the successive
differences in the neighborhood of these points. (These are
reproduced here from the appendix for comparison with the
results of treatment.) The situation here is somewhat confused.
It does not conform to the signature (pattern) for a single
isolated disturbance. One possible procedure is to declare
all four observations on x and on y as outliers. Instead
of doing this, consider one point at a time. Since the largest
magnitudes of the $D_{4i}$'s occur at time $t_i = 909$, the
corresponding values of $x_i$ and $y_i$ will be declared outliers.
Replacing these values with the average of the values at
$t_i = 908$ and $t_i = 910$ yields the modified results presented
in Table 5.2. All of the fourth order successive differences
are now less than $D_4^*$ and, moreover, are less than the modified
thresholds given in Table 4.2 (see Figure 4.1).
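
The single screening step described above (find the largest fourth
order difference above the threshold, replace that observation by
the average of its neighbors, then recompute) can be sketched in
Python. This is only an illustrative sketch, not the report's
program. The function names and the synthetic path are assumptions;
only the coefficient 25.1 and the value $\sigma = 4$ come from the
text. The centered definition of the fourth difference is the one
that reproduces the $D_{4i}$ column of the appendix listing. The
coefficient 25.1 is numerically close to $3\sqrt{70}$, the value three
standard deviations of a fourth difference of independent noise
would give, although the derivation itself belongs to Section 2.

```python
def fourth_differences(x):
    """Centered fourth differences; None where the 5-point window does not fit."""
    d4 = [None] * len(x)
    for i in range(2, len(x) - 2):
        d4[i] = x[i - 2] - 4 * x[i - 1] + 6 * x[i] - 4 * x[i + 1] + x[i + 2]
    return d4


def replace_largest_crossing(x, sigma, coeff=25.1):
    """Replace the observation with the largest |D4| above coeff * sigma.

    Returns the index that was replaced, or None if nothing crosses
    the threshold. The list x is modified in place.
    """
    threshold = coeff * sigma
    d4 = fourth_differences(x)
    crossings = [(abs(d), i) for i, d in enumerate(d4)
                 if d is not None and abs(d) > threshold]
    if not crossings:
        return None
    _, worst = max(crossings)
    # Replacement value: average of the two neighboring observations.
    x[worst] = 0.5 * (x[worst - 1] + x[worst + 1])
    return worst


# Hypothetical data (not from the report): a smooth quadratic path
# with a single disturbance of +60 added at index 6.
path = [1000.0 - 90.0 * t + 0.5 * t * t for t in range(12)]
path[6] += 60.0

first = replace_largest_crossing(path, sigma=4.0)   # index that was replaced
second = replace_largest_crossing(path, sigma=4.0)  # None once nothing crosses
print(first, second)
```

In this synthetic case the single disturbance produces threshold
crossings at three adjacent indices (5, 6 and 7), because an
isolated disturbance enters the fourth differences with weights
1, -4, 6, -4, 1. Replacing only the observation with the largest
difference removes all three crossings.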

There may, and should, be some doubt as to whether
declaration of the observations at $t_i = 909$ as isolated
outliers is sufficient treatment for this situation. As can be
seen in Table 5.2, the fourth order differences at $t_i = 911$
are quite large even though they do not exceed their threshold.
Further, the signatures for both $x_i$ and $y_i$ are similar to
what would be anticipated for isolated disturbances at $t = 911$.
If, for example, the noise standard deviation were $\sigma = 3$
instead of $\sigma = 4$, then the fourth order differences for
$x_i$ and $y_i$ at $t = 911$ would both exceed their thresholds and
the observations would be declared outliers. The
results of this treatment are shown in Table 5.3. All of the
large successive differences have been reduced substantially
and the situation now appears to be free of disturbances.
(Reduced thresholds for situations involving two disturbances
separated by a non-disturbed observation are not available
but should be derived so that the treatment could be completed.)

As a peripheral examination of this situation, the
possibility that the observations at $t = 910$ were the initial
outliers was examined. Note that the fourth order differences
at $t = 909$ and $t = 910$ are reasonably close and could
possibly have been reversed in order of magnitude by the noise
components. The results are presented in Table 5.4. Both $x_i$
and $y_i$ at $t = 909$ are now indicated as outliers, exceeding
not only the modified thresholds but the general threshold
$D_4^* = 100.4$. Replacing both points as outliers yields the results
shown in Table 5.5. An interesting outcome should be noted.
The fourth order differences for both x and y at $t = 910$
now exceed the modified threshold appropriate for situations
involving adjacent missing points, namely, $D^* = 5.1\sigma = 20.4$.
(See Table 4.4.) But the observations at $t = 910$ have already
been modified. This suggests that the observations at $t = 910$
should not have been considered outliers initially.

The situation in the vicinity of $t = 910$ in the data
illustrates several features of the use of
successive differences in identification of outliers. First,
identification of outliers by successive differences can be
awkward when there are several threshold crossings adjacent
to each other. As can be seen in the situation with threshold
crossings at times $t = 908$, 909, 910, and 911, rejection of
the observations at $t = 909$ and 911 appears to be sufficient
to reduce the successive differences to magnitudes that could be
produced by noise. A procedure involving rejection of one
observation at a time, starting with the largest one,
and recalculating the successive differences to be examined
for other threshold crossings seems reasonable. If several of
the successive differences have nearly the same magnitudes,
however, this could lead to rejection of the wrong observations,
as demonstrated by rejecting the observations at
$t = 910$ first.
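
The one-at-a-time procedure discussed above can be written as a
short loop. The sketch below is a hedged illustration rather than
the algorithm of Section 2.H. The function names, the cap on the
number of rejections and the synthetic data are all assumptions.
The two disturbances are placed two samples apart to mimic the
pattern at $t = 909$ and $t = 911$.

```python
def fourth_differences(x):
    """Centered fourth differences; None where the 5-point window does not fit."""
    d4 = [None] * len(x)
    for i in range(2, len(x) - 2):
        d4[i] = x[i - 2] - 4 * x[i - 1] + 6 * x[i] - 4 * x[i + 1] + x[i + 2]
    return d4


def screen_outliers(x, sigma, coeff=25.1, max_rejections=10):
    """Reject the observation with the largest |D4| until none exceeds the threshold.

    Returns the list of rejected indices (in order of rejection) and
    the modified copy of the data. max_rejections is a safety cap and
    is an assumption of this sketch.
    """
    x = list(x)
    threshold = coeff * sigma
    rejected = []
    while len(rejected) < max_rejections:
        d4 = fourth_differences(x)
        worst = max(range(2, len(x) - 2), key=lambda i: abs(d4[i]))
        if abs(d4[worst]) <= threshold:
            break
        # Replace the rejected observation by the average of its neighbors.
        x[worst] = 0.5 * (x[worst - 1] + x[worst + 1])
        rejected.append(worst)
    return rejected, x


# Hypothetical straight-line path with disturbances at indices 6 and 8.
path = [1000.0 - 90.0 * t for t in range(13)]
path[6] += 80.0
path[8] += 50.0

rejected, cleaned = screen_outliers(path, sigma=4.0)
print(rejected)
```

Before any rejection the fourth differences at indices 6 and 7 in
this example are 530 and -520. Magnitudes this close are the kind
that noise could reverse, which is the hazard noted in the text.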

The  second  feature  of  this  example  is  an  outgrowth  of 
the  first.   An  algorithm,  and  the  subsequent  computer  program, 
which  will  provide  satisfactory  treatment  for  multiple  adjacent 
threshold  crossings  will  be  awkward  to  produce.   Nevertheless, 
merely  identifying  such  situations  and  relegating  them  for 
manual  processing  should  be  avoided  since  it  contradicts  the 
objective  of  complete  automatic  processing. 

The third feature arises when the first order differences
are examined. There appears to be a substantial change in
velocity (the $a_1$ term of the polynomial component) in both
the x and y coordinates. The possibility that the perturbation
in the vicinity of $t = 909$ is due to a change in the
polynomial component instead of, or in addition to, a disturbance
causing outliers should be considered. This situation should
be re-examined when curve-fitting to the data is attempted.
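
To see why a velocity change can be confused with a disturbance,
it may help to compare the fourth differences produced by the two
effects on noise-free data. The sketch below uses hypothetical
straight-line paths; the slope, the size of the disturbance and
the size of the velocity change are arbitrary illustrative values.

```python
def fourth_differences(x):
    """Centered fourth differences over the interior points of x."""
    return [x[i - 2] - 4 * x[i - 1] + 6 * x[i] - 4 * x[i + 1] + x[i + 2]
            for i in range(2, len(x) - 2)]


n, k = 11, 5

# Case 1: straight line with one disturbed observation (+30) at index k.
outlier = [10.0 * t for t in range(n)]
outlier[k] += 30.0

# Case 2: no disturbance, but the slope increases by 30 per step after index k.
kink = [10.0 * t + 30.0 * max(0, t - k) for t in range(n)]

d4_outlier = fourth_differences(outlier)  # weights 1, -4, 6, -4, 1 times 30
d4_kink = fourth_differences(kink)        # weights 1, -2, 1 times 30
print(d4_outlier)
print(d4_kink)
```

Under these assumptions the isolated disturbance spreads over five
fourth differences, while the abrupt velocity change spreads over
only three. A sufficiently large velocity change can still cross
the threshold, which is why both explanations remain open here.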

One final comment on this situation: the analysis
was performed by consideration of the fourth order differences
(the $D_{4i}$'s) only. It appears that the second and third
order differences confirm the indications of the $D_{4i}$'s but
add little of a supplementary nature. Again, this points to
the use of only one order of differences for indication of
outliers, and the preference should be for the higher order
as containing the least contamination by any polynomial
component in the observations.

Another example of a threshold crossing occurs at
$t = 851$ (Table 6.1). Note that in this situation only the
y coordinate produces a crossing. The question as to whether
the observation on x should also be rejected must be
considered. In order to answer this question it may be necessary
to examine the data collection process (e.g., the sensors and
the geometry of the situation). The results of replacing
both the x and y observations at $t = 851$ are presented
in Table 6.2. Whether the improvement in the x coordinate
is worth the effort is debatable at this stage.

A third event of threshold crossings in the data occurs
in the vicinity of $t = 893$. Again, multiple adjacent
crossings occur, but only in the y coordinate. (See Table 7.1.)
The successive differences after replacing the observations
at $t = 893$ are shown in Table 7.2 and after replacing the
observations at both $t = 893$ and $t = 890$ in Table 7.3.
Although the $D_{4i}$'s are well below the general bound
$D_4^* = 25.1\sigma$ for $\sigma = 4$ or $\sigma = 3$, they exceed the
modified bounds given in Table 4.1 for observations in the vicinity
of a single missing point. This situation has not been
pursued further. As in the two situations already discussed
(vicinities of $t = 851$ and $t = 909$), there appears to
be a substantial change in the velocity components of the
vehicular path as evidenced by the values of the $D_{1i}$'s.

The three situations examined above are the only ones
in which values of the $D_{4i}$'s exceed the threshold
$D_4^* = 25.1\sigma = 100.4$ with $\sigma = 4$. In all three situations
the values of the $D_{1i}$'s indicate that there is a possibility of a
perturbation in the form of a change in the polynomial component
of the observations. It would thus appear desirable to postpone
further screening for outliers until the curve-fitting
portion of the data smoothing effort. After such treatment
of this data set and, possibly, experience gained from
examination of other data sets, the desirability of finer
screening for outliers using successive differences should
be reassessed.

The final comments on the data set considered here
pertain to information provided by successive differences
on the appropriate degree of the polynomial to be used in
curve fitting. As described in Section 2.1, the primary
evidence to be considered here is the existence of sequences
of successive differences of a given order having the same
sign. Naturally, sequences of $D_{1i}$'s having the same sign
occur in the data and would be expected for a torpedo path,
since a torpedo without a velocity cannot hope to intercept
its target. No attempt to fit a polynomial of degree less
than one is contemplated. The only occurrences of sequences
of $D_{3i}$'s or $D_{4i}$'s with the same signs and having length
greater than four start at $t = 859$ and $t = 863$. The
probability that a sequence of similar signs has length
greater than $S = 4$ is $P(k \ge 5) = (0.5)^{\ldots} = 0.167$ (if the
differences were due to noise only and the noise components
of the differences were independent). The reduced probability
of this event, due to the lack of independence, suggests that
the polynomials to fit both the x and y coordinates
in the segment $t = 851$ to $t = 867$ should be of degree
at least three and, more likely, four. Examination of the plot
of the torpedo path shown in the appendix indicates that this
is, indeed, the segment of the torpedo path where the greatest
changes occurred.

4.   CONCLUSIONS  AND  RECOMMENDATIONS 

From the development of the model and its subsequent
application to data from a torpedo path, it should
be evident that successive differences provide some capability
for detection of outliers. For practical purposes, an 'outlier'
can be defined as an observation whose magnitude is unreasonably
large when only its polynomial and noise components are
considered. An algorithm for using successive differences to
detect outliers is presented in Section 2.H. In this algorithm,
attention is centered on the fourth order successive differences
(the $D_{4i}$'s), and successive differences of lower orders
are ignored in screening for outliers.

As a secondary use, successive differences provide
some indication of appropriate polynomial degrees for the
curve-fitting portion of the data smoothing process. This information
is negative in form: a substantial sequence of similar
signs for successive differences of a given order provides
evidence that a polynomial of degree lower than that order
cannot be expected to provide an acceptable fit to the data
which produced that sequence.
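
Locating sequences of similar signs is simple to automate. The
sketch below is illustrative only; the function names are
assumptions. The input is the column of first order differences of
the y coordinate for t = 797 to 807 as listed in the appendix.

```python
def sign(v):
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_runs(values):
    """Return (sign, start_index, length) for each maximal run of equal sign.

    A zero is treated as its own sign and therefore breaks a run.
    """
    runs = []
    for i, v in enumerate(values):
        s = sign(v)
        if runs and runs[-1][0] == s:
            runs[-1][2] += 1
        else:
            runs.append([s, i, 1])
    return [tuple(r) for r in runs]


# First order differences of y for t = 797 ... 807 (appendix listing).
d1_y = [19.1, 24.1, 37.9, 5.9, -35.5, -39.2, -25.5, -17.5, -0.8, 20.6, 26.6]

runs = sign_runs(d1_y)
longest = max(runs, key=lambda r: r[2])
print(runs)
print(longest)
```

Here the longest run is five negative first order differences
starting at t = 801. By the reasoning of the text, a run of first
order differences argues only against a polynomial of degree less
than one. The same function could be applied to the $D_{3i}$ or
$D_{4i}$ columns to look for evidence against lower degrees.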

The outline for the algorithm presented in Section 2.H
requires additional development before it can be incorporated
in a data smoothing program. The primary need here is for a
more thorough treatment for situations involving missing
points.

Since outliers are to be identified by crossings of
threshold values by successive differences and since these
threshold values are specified in terms of the standard deviation
$\sigma$ of the noise, the selection of an appropriate value for $\sigma$
is fundamental to the screening process. Potential sources
for values for $\sigma$ are the data gathering system and the data
available from torpedo paths.
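
One way the torpedo path data themselves might suggest a value for
$\sigma$ is sketched below. This is an assumption-laden illustration
and not a method proposed in the report. It assumes independent
noise of equal variance and a polynomial component of degree three
or less over the window, in which case the variance of a fourth
difference is $70\sigma^2$ (the sum of the squared weights 1, 4, 6,
4, 1). The x values are those listed in the appendix for t = 797 to
807; the function name is hypothetical.

```python
import math

# x coordinates for t = 797 ... 807, as listed in the appendix.
x = [26565.3, 26522.5, 26472.5, 26385.2, 26297.6, 26212.3,
     26132.5, 26039.6, 25950.8, 25863.3, 25764.6]


def fourth_differences(x):
    """Centered fourth differences over the interior points of x."""
    return [x[i - 2] - 4 * x[i - 1] + 6 * x[i] - 4 * x[i + 1] + x[i + 2]
            for i in range(2, len(x) - 2)]


d4 = fourth_differences(x)  # seven values, for t = 799 ... 805

# Root-mean-square fourth difference divided by sqrt(70).
sigma_hat = math.sqrt(sum(d * d for d in d4) / (70.0 * len(d4)))
print([round(d, 1) for d in d4])
print(round(sigma_hat, 2))
```

With only seven differences, and with a visible change in velocity
near the start of this window, the result should be read as a rough
plausibility check on the assumed $\sigma$ and nothing more. Any
undetected outliers in the window would inflate the estimate.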

The possibility of modifying the thresholds (conceptually,
by using a smaller value for the coefficient of $\sigma$ in Section 2)
so as to remove, at the screening stage, some of the outliers
that would otherwise be identified only in the subsequent
curve-fitting portion of the data smoothing process should
be examined. Identifying any such outliers by
successive differences can substantially reduce the
repetitions of curve-fitting to the affected data segments.
Further, the possibility of using missing points in selecting
appropriate data segments for curve-fitting will be facilitated
by early identification of missing points caused by elimination
of outliers. This use will be discussed in a subsequent
report.

APPENDIX  A 

DATA  FROM  A  TORPEDO  PATH  AT  NUWES 

The  model  developed  in  this  report  was  applied  to 
data  collected  on  a  specific  test  in  which  a  torpedo  was 
launched  against  a  submarine  at  the  Naval  Undersea  Warfare 
Engineering  Station.   A  major  part  of  the  torpedo  path  is 
sketched  in  the  accompanying  figure  and  the  data  is  listed 
in  the  table  which  follows.   Only  the   x   and  y   coordinates 
are  included. 


 t_i       x_i    D_1i    D_2i    D_3i    D_4i        y_i    D_1i    D_2i    D_3i    D_4i

 797   26565.3   -42.     21.0   -28.2   -78.6    -3802.3    19.1    -7.7    12.7
 798   26522.5   -50.0    -7.2   -30.1    -1.9    -3783.2    24.1     5.0
 799   26472.5   -87.3   -37.3    37.0    67.1    -3759.1    37.9    13.8   -45.8
 800   26385.2   -87.6    -0.3     2.6   -34.4    -3721.2     5.9   -32.0    -9.4
 801   26297.6   -85.3     2.3     3.2     0.6    -3715.3   -35.5   -41.     37.7
 802   26212.3   -79.8     5.5   -18.6   -21.     -3750.8   -39.2    -3.7
 803   26132.5   -92.9   -13.1    17.2    35.8    -3790.0   -25.5    13.7
 804   26039.6   -88.8     4.1    -2.8   -20.0    -3815.5   -17.5     8.0
 805   25950.8   -87.5     1.3   -12.5    -9.7    -3833.0    -0.8    16.7
 806   25863.3   -98.7   -11.     22.1    34.6    -3833.8    20.6    21.4
 807   25764.6   -87.8    10.9    -3.7   -25.8    -3813.2    26.6     6.0